-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
import course staff management command #255
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to put this draft out for initial review to get feedback and to make sure I'm on the right track. Also, I'm guessing for testing that I can just create a CSV and test locally that this populates the db in django admin?
edx_exams/apps/core/management/commands/bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
edx_exams/apps/core/management/commands/bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
edx_exams/apps/core/management/commands/bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
edx_exams/apps/core/management/commands/bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
LOGGER_NAME = 'edx_exams.apps.core.management.commands.bulk_add_course_staff' | ||
|
||
|
||
class TestBulkAddCourseStaff(TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be worth adding a test here that ensures this command produces the expected number or queries w/ assertNumQueries
. example:
with self.assertNumQueries(4): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense! When I used this method, I see that there are a number of calls: https://github.com/edx/edx-exams/actions/runs/8394389300/job/22991407617?pr=255. Is it better to just check relative number of queries (for example, that batch size affects the number of queries for example) or is there a better way to finetune? I couldn't figure out a good way to constrain to only a certain type of query (for example, just the bulk query). Was poking around here: https://docs.djangoproject.com/en/5.0/topics/testing/tools/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably just make this it's own test rather than baking it into the others as an extra assertion. For that one I'd try a bunch of extra rows to ensure the query count has not increased in an unexpected way based on more data.
For getting the expected number you should be able to walk though the code and count things up for what you expect. If it's different, it would be good to know why or where the extra query is coming from. There's a few ways to debug django queries but an easy one is to use
from django.db import connection
print(connection.queries)
0d5b026
to
901ebb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready for review! Added some specific questions about batch testing and also testing using assertNumQueries.
parser.add_argument( | ||
'--batch_size', | ||
type=int, | ||
default=10000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a good default size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a default I think maybe we keep the number lower in the hundreds or 1k max. Not sure when we'd start running into memory issues with a really big bulk create query. You could play with this locally to get an estimate however.
edx_exams/apps/core/management/commands/bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
edx_exams/apps/core/management/commands/test/test_bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
edx_exams/apps/core/management/commands/test/test_bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still looking over things but a few comments to start
parser.add_argument( | ||
'--batch_size', | ||
type=int, | ||
default=10000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a default I think maybe we keep the number lower in the hundreds or 1k max. Not sure when we'd start running into memory issues with a really big bulk create query. You could play with this locally to get an estimate however.
edx_exams/apps/core/management/commands/bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
LOGGER_NAME = 'edx_exams.apps.core.management.commands.bulk_add_course_staff' | ||
|
||
|
||
class TestBulkAddCourseStaff(TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably just make this it's own test rather than baking it into the others as an extra assertion. For that one I'd try a bunch of extra rows to ensure the query count has not increased in an unexpected way based on more data.
For getting the expected number you should be able to walk though the code and count things up for what you expect. If it's different, it would be good to know why or where the extra query is coming from. There's a few ways to debug django queries but an easy one is to use
from django.db import connection
print(connection.queries)
csv = self._write_test_csv(csv, lines) | ||
with self.assertNumQueries(2): | ||
call_command(self.command, f'--csv_path={csv.name}') | ||
assert CourseStaffRole.objects.filter(course_id='course-v1:edx+test+f20').count() == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to test if the correct user/role was created here rather than assert any role exists. Same for the below tests
edx_exams/apps/core/management/commands/test/test_bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
38b57fe
to
b0a9515
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! I left a few, hopefully quick, questions for you.
edx_exams/apps/core/management/commands/bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
""" | ||
Add the given set of course staff provided in csv | ||
""" | ||
reader = list(unicodecsv.DictReader(csv_file)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What made you choose the unicodecsv
package? I ask because Python's native csv
module has a DictReader
class that should provide the same functionality. Looking at the docs for unicodecsv
, it looks like it improves upon some unicode issues in Python 2, but we're on Python 3 here, so I'd imagine we don't benefit from unicodecsv
. Where possible, I think it would be better to leverage native packages. I think this should be an easy swap without any impact to the command or its tests. What do you think?
As a nit, at least for the csv.DictReader
class, the class is iterable, so you don't have to cast the value to a list; you can just iterate the instance. Maybe unicodecsv.DictReader
is different? Oh, I see. You're casting it to a list because you're iterating reader
twice and don't want to exhaust the iterable, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point! I've changed to use csv. The only reason I used unicode was because I had referenced code from elsewhere... so not really a good reason haha.
I primarily cast to a list because I check the length of the reader (in order to batch the creates for the course staff). Lemme know if this is a bad idea though!
edx_exams/apps/core/management/commands/bulk_add_course_staff.py
Outdated
Show resolved
Hide resolved
for i in range(0, len(reader), batch_size): | ||
CourseStaffRole.objects.bulk_create( | ||
CourseStaffRole( | ||
user=User.objects.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the answer to this question, and maybe this is not possible. But I noticed that you create the users_existing
set querying the User
model via the username
field. Here, though, you're querying the User
model via the username
and the email
field. I'm wondering whether a scenario is possible where a user's would be found in users_existing
but not in this query.
If we're pulling this data from the LMS, I think this could happen, because email
is modifiable, but username
isn't. I think email
is maintained on our User
model via syncing with the JWT
when they hit edx-exams
in the process of an exam, right? Do you think this is something we need to address here? I'm just not sure this code guarantees a user will exist, which could cause an exception here.
Since the function is in a transaction, it should be okay, because it'll rollback, we can correct the CSV, and try again, but it would be better to handle explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, that's a really good point. Since the username is unique, could I just get by username instead? I've updated to be that way but let me know if that could cause any issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's good! username
should not change.
edx_exams/apps/core/management/commands/test/test_bulk_add_course_staff.py
Show resolved
Hide resolved
5c86af2
to
e4f5931
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thank you for incorporating my feedback.
87fdab0
to
195a50e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
JIRA: COSMO-200
Description: Create a management command to populate the CourseStaffRole model and any related User objects with data from the LMS. Once merged create an ESRE ticket to upload the CSV file and run the command.